ABSTRACT 



Training courses organized by the World Bank Institute (WBI) 
have recently started to assess participant learning using a randomized, 
cognitive pretest-posttest. Some trainers, however, feel reluctant to use 
this Level 2 evaluation (D. Kirkpatrick, 1994) in their courses, and continue 
to rely on participants' self-assessment of their own knowledge, a more 
common, traditional approach that has been used by the WBI. This study 
investigated whether participants' perceptions about what they have learned 
can be a valid proxy of what they have actually learned. In some cases 
self-assessment by participants was positively correlated with the amount of 
their actual learning. However, the correlation was too weak to enable 
researchers to rely on perceived self-reported data on learning to measure 
actual learning. Breaking down the data by gender, region, education, or 
years of related experience showed that none of the groups studied were 
consistently able to validly assess how much they learned in a course. 
Abstract 

Training courses organized by the World Bank Institute (WBI) have recently started to 
assess participant learning using a randomized, cognitive pre-post test¹. Some trainers, 
however, feel reluctant to use this Level 2² evaluation in their courses, and continue to 
rely on participants’ self-assessment of their own knowledge, a more common traditional 
approach that has been used in WBI. This study investigated whether participants’ 
perceptions about what they have learned can be a valid proxy of what they have actually 
learned. We found that in some cases self-assessment by participants (i.e., trainees) was 
positively correlated with the amount of their actual learning. However, the correlation 
was too weak to enable us to rely on perceived self-reported data on learning to measure 
actual learning. Breaking down the data by gender, region, education or years of related 
experience showed that none of the groups studied were consistently able to validly 
assess how much they learned in a course. 

Introduction 

Traditionally, all training courses offered by WBI have used some type of course 
evaluation by participants to assess the courses’ performance. Until a few years ago, the most 
common form of course evaluation was simply an end-of-course evaluation form that 
asked participants for their reaction to the course. Then the idea of measuring participant 
learning was introduced, as part of the effort to assess the effectiveness of the Institute’s 
training courses on participants. 

Because of the initial resistance of most trainers to directly assess participants’ learning, 
the Institute first used a ‘post-then self-assessment’ evaluation method. This method 
consists of asking participants to assess their level of knowledge of the course topics, 
based on their perceptions. For this, participants complete a single questionnaire at the 
end of the course. They give two ratings for each topic of the course. With the first rating, 
participants retrospectively assess how much they knew about the topics just before the 
course. The second rating asks participants to self-assess their current knowledge of the 
course topics. 
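
As a minimal sketch of how the two ratings might be turned into a perceived learning gain, the example below subtracts the retrospective ("then") rating from the end-of-course ("post") rating for each topic. The topic names, the 1-to-5 rating scale, and the simple subtraction are assumptions made for illustration; this section does not specify how the Institute scored the questionnaire.

```python
from statistics import mean


def perceived_gains(ratings):
    """Return the perceived gain per topic.

    `ratings` maps a topic name to a (then_rating, post_rating) pair,
    both given on the same end-of-course questionnaire.
    The subtraction below is an assumed scoring rule, not the paper's.
    """
    return {topic: post - then for topic, (then, post) in ratings.items()}


# Hypothetical ratings from one participant on an assumed 1-5 scale.
ratings = {"topic_a": (2, 4), "topic_b": (3, 3), "topic_c": (1, 4)}

gains = perceived_gains(ratings)
average_gain = mean(gains.values())
```

Both ratings come from the same questionnaire, so the gain reflects only what the participant perceives, not what a knowledge assessment would show.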



1 In this paper, the words ‘test’ and ‘assessment of actual knowledge’ are used interchangeably. 

2 In this paper, evaluation levels refer to the four Levels of Kirkpatrick. Level 1 measures 
participants/trainees’ reactions to the training, and Level 2 their actual learning of the training materials. In 
our context, both measures are collected via questionnaires administered to the full group of participants at 
the training course. 
This self-assessment method helped to make the idea of measuring learning more 
acceptable among WBI’s trainers. It eventually enabled the Evaluation Unit to implement 
a method that would objectively measure the levels of knowledge acquisition by 
participants. A ‘randomized, cognitive pre-post assessment of actual learning’ method 
was designed and introduced in 1998. 

In this Level 2 evaluation method, trainers develop a set of multiple-choice questions 
(usually 30). The Evaluation Unit randomly assigns them to two groups of questions. Half 
of the questions are answered by the participants at the beginning of the course, and the 
other half at the end of the course. The assessments are anonymous. Confidential numbers 
are used to match individual responses to the pre-course and post-course assessments. 
Participants are instructed to choose one response option per question. If they don’t know 
the answer, they should simply answer ‘I don’t know.’ This procedure is not a test of the 
participants’ knowledge, as it would be given in a university, but an assessment of the 
trainers’ ability to effectively convey the course contents to the participants. 
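
The random assignment and the matching step described above could be sketched as follows. This is an illustration under stated assumptions, not the Evaluation Unit's actual procedure: the function names, the use of integer question IDs, the optional seed, and the representation of scores as a dictionary keyed by confidential number are all hypothetical.

```python
import random


def split_questions(question_ids, seed=None):
    """Randomly assign questions to a pre-course half and a post-course half."""
    rng = random.Random(seed)  # seed is optional; it only makes the split repeatable
    shuffled = list(question_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


def match_responses(pre_scores, post_scores):
    """Pair pre- and post-course scores by confidential number.

    Participants who completed only one of the two assessments are dropped,
    since no individual gain can be computed for them.
    """
    common = pre_scores.keys() & post_scores.keys()
    return {pid: (pre_scores[pid], post_scores[pid]) for pid in common}


# 30 questions, as in the usual case mentioned in the text.
pre_set, post_set = split_questions(range(1, 31), seed=1)

# Hypothetical scores keyed by confidential number.
matched = match_responses({"A1": 5, "B2": 7}, {"A1": 9, "C3": 4})
```

Because the two halves contain different questions, a pre-post comparison of this kind assumes that random assignment makes the halves comparable in difficulty on average.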

However, some trainers have expressed a reluctance to move to the Level 2 evaluation 
and have remained with a Level 1 evaluation, relying on participant ‘post-then self-assessment’ 
of knowledge. Their main stated reason for being reluctant is that many 
courses target senior-level government officials from various countries, and the trainers 
simply don’t feel that it is appropriate to give a ‘test’ to this kind of senior-level audience 
at their courses. 

At the Evaluation Unit of the Institute, the reluctance of some trainers to move to the 
Level 2 evaluation has been an issue. For those courses that used only the Level 1 
evaluation with a post-then method, we have been forced to base our judgement of 
participant learning solely on self-assessed knowledge gains reported by course 
participants. There are no data available for these courses that would help us objectively 
assess potential changes in participant knowledge. Despite this lack of data, some trainers 
started to report perceived learning gains obtained through self-assessment methods as if 
they were actual learning gains. For these reasons, we felt that it was important to find out 
whether or not self-reported data from participants can serve as a ‘proxy’ for measuring 
learning. 

We looked into the literature and found various studies that investigated the potential 
relationship between training attributes and/or outcomes, such as changes in work 
behaviors or knowledge test scores, and trainee characteristics or their reactions, e.g., 
Faerman and Ban (1992), Warr and Bunce (1995), Alliger et al. (1997). While the studies 
all appear to conclude that trainee reactions, in particular, cannot be used as a substitute 
for actual learning, some of them did report examples where positive correlations existed 
between trainee reactions and learning outcomes (e.g., Alliger et al.). 

Alliger et al. (1997) slightly modified Kirkpatrick’s four levels of evaluation, and 
provided a more detailed breakdown of training criteria. They discussed a case where 
utility-type reaction measures (reactions to such questions as ‘To what degree will this 
training influence your ability later to perform your job?,’ ‘Was this training job 
relevant?,’ and ‘Was the training of practical value?’) did correlate with immediate 
learning and on-the-job performance. They also found that this correlation was much 
stronger than that observed with affective-type reaction measures (e.g., ‘liking of 
training’). They reported that trainee reactions should not be used blindly for the 
assessment of actual learning, but utility questions can be used as a better estimate if the 
training criterion is carefully defined and developed. 

Nancy Dixon’s 1990 study may be the closest to ours. She asked the question: 
‘To what extent do perceptions of amount learned correlate with actual learning scores?’ 
Using post-test results of 1,200 employees of a large manufacturing company 
in the Southwest (US), she found no relationship between participants’ 
perceptions of how much they learned and their actual test scores. 

The literature being rather slim and inconclusive so far, we decided to use our data set 
(collected from training courses offered by WBI) to find out what it would tell us about 
the relationship between participant reactions and their actual learning. We approached 
this study from different perspectives, using three different methods: 

1) Comparison of participant ratings on the traditional end-of-course Level 1 question, 
‘Extent to which you have acquired information that is new to you,’ with their scores 
on pre-post actual knowledge assessments; 

2) Comparison between learning gains reported by participants based on post-then self- 
assessment and learning gains observed on pre-post assessment of actual learning; and 

3) Observation of the participants’ responses to the pre-post actual knowledge 
assessment from the point of view of their ability to perceive whether or not they 
knew the answer to a question. For this, we compared the participants who correctly 
assessed their knowledge of the question, by selecting either the right answer or 
‘I don’t know’, with the participants who selected a wrong answer to the 
question. 
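
Two of the comparisons listed above can be sketched in a few lines. The first function computes a correlation between perceived and actual learning gains, as in method 2; Pearson's coefficient is assumed here because this section does not name the statistic used. The second function encodes the grouping rule of method 3. The function names and the response labels are hypothetical.

```python
from math import sqrt

DONT_KNOW = "I don't know"


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumes each list has some variation; a constant list would make
    the denominator zero.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def assessed_correctly(response, correct_answer):
    """Method 3 grouping: True if the participant chose the right answer
    or admitted not knowing; False if a wrong answer was selected."""
    return response == correct_answer or response == DONT_KNOW
```

A coefficient near zero from `pearson(perceived_gains, actual_gains)` would indicate that self-assessed gains carry little information about measured gains, which is the question the three methods are meant to address.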



Evaluation Methods and Respective Preliminary Results 

First we need to emphasize the fact that our data set is small and that our preliminary 
findings need to be strengthened by additional data. Since we build our data according to 
what trainers want to find out on each specific course, no question (be it demographic, 
Level 1 or Level 2) is asked systematically. Consequently, breaking down our data by 
demographic categories most often results in very small samples. This is important to 
keep in mind while reading the results provided throughout this study. 

The study tried to answer the key question, ‘Can measuring people’s perceptions of 
learning be a valid proxy for measuring what they have learned?’, using the following 
three methods. 